Goto

Collaborating Authors

 sketch understanding


SEVA: Leveraging sketches to evaluate alignment between human and machine visual abstraction

Neural Information Processing Systems

Sketching is a powerful tool for creating abstract images that are sparse but meaningful. Sketch understanding poses fundamental challenges for general-purpose vision algorithms because it requires robustness to the sparsity of sketches relative to natural visual inputs and because it demands tolerance for semantic ambiguity, as sketches can reliably evoke multiple meanings. While current vision algorithms have achieved high performance on a variety of visual tasks, it remains unclear to what extent they understand sketches in a human-like way. Here we introduce $\texttt{SEVA}$, a new benchmark dataset containing approximately 90K human-generated sketches of 128 object concepts produced under different time constraints, and thus systematically varying in sparsity. We evaluated a suite of state-of-the-art vision algorithms on their ability to correctly identify the target concept depicted in these sketches and to generate responses that are strongly aligned with human response patterns on the same sketch recognition task. We found that vision algorithms that better predicted human sketch recognition performance also better approximated human uncertainty about sketch meaning, but there remains a sizable gap between model and human response patterns. To explore the potential of models that emulate human visual abstraction in generative tasks, we conducted further evaluations of a recently developed sketch generation algorithm (Vinker et al., 2022) capable of generating sketches that vary in sparsity. We hope that public release of this dataset and evaluation protocol will catalyze progress towards algorithms with enhanced capacities for human-like visual abstraction.


ProHD: Projection-Based Hausdorff Distance Approximation

Fu, Jiuzhou, Guo, Luanzheng, Tallent, Nathan R., Zhao, Dongfang

arXiv.org Artificial Intelligence

The Hausdorff distance (HD) is a robust measure of set dissimilarity, but computing it exactly on large, high-dimensional datasets is prohibitively expensive. We propose \textbf{ProHD}, a projection-guided approximation algorithm that dramatically accelerates HD computation while maintaining high accuracy. ProHD identifies a small subset of candidate "extreme" points by projecting the data onto a few informative directions (such as the centroid axis and top principal components) and computing the HD on this subset. This approach guarantees an underestimate of the true HD with a bounded additive error and typically achieves results within a few percent of the exact value. In extensive experiments on image, physics, and synthetic datasets (up to two million points in $D=256$), ProHD runs 10--100$\times$ faster than exact algorithms while attaining 5--20$\times$ lower error than random sampling-based approximations. Our method enables practical HD calculations in scenarios like large vector databases and streaming data, where quick and reliable set distance estimation is needed.


RadarSFD: Single-Frame Diffusion with Pretrained Priors for Radar Point Clouds

Zhao, Bin, Garg, Nakul

arXiv.org Artificial Intelligence

Millimeter-wave radar provides perception robust to fog, smoke, dust, and low light, making it attractive for size, weight, and power constrained robotic platforms. Current radar imaging methods, however, rely on synthetic aperture or multi-frame aggregation to improve resolution, which is impractical for small aerial, inspection, or wearable systems. We present RadarSFD, a conditional latent diffusion framework that reconstructs dense LiDAR-like point clouds from a single radar frame without motion or SAR. Our approach transfers geometric priors from a pretrained monocular depth estimator into the diffusion backbone, anchors them to radar inputs via channel-wise latent concatenation, and regularizes outputs with a dual-space objective combining latent and pixel-space losses. On the RadarHD benchmark, RadarSFD achieves 35 cm Chamfer Distance and 28 cm Modified Hausdorff Distance, improving over the single-frame RadarHD baseline (56 cm, 45 cm) and remaining competitive with multi-frame methods using 5-41 frames. Qualitative results show recovery of fine walls and narrow gaps, and experiments across new environments confirm strong generalization. Ablation studies highlight the importance of pretrained initialization, radar BEV conditioning, and the dual-space loss. Together, these results establish the first practical single-frame, no-SAR mmWave radar pipeline for dense point cloud perception in compact robotic systems.


Robust 2D lidar-based SLAM in arboreal environments without IMU/GNSS

Nazate-Burgos, Paola, Torres-Torriti, Miguel, Aguilera-Marinovic, Sergio, Arévalo, Tito, Huang, Shoudong, Cheein, Fernando Auat

arXiv.org Artificial Intelligence

Simultaneous localization and mapping (SLAM) approaches for mobile robots remains challenging in forest or arboreal fruit farming environments, where tree canopies obstruct Global Navigation Satellite Systems (GNSS) signals. Unlike indoor settings, these agricultural environments possess additional challenges due to outdoor variables such as foliage motion and illumination variability. This paper proposes a solution based on 2D lidar measurements, which requires less processing and storage, and is more cost-effective, than approaches that employ 3D lidars. Utilizing the modified Hausdorff distance (MHD) metric, the method can solve the scan matching robustly and with high accuracy without needing sophisticated feature extraction. The method's robustness was validated using public datasets and considering various metrics, facilitating meaningful comparisons for future research. Comparative evaluations against state-of-the-art algorithms, particularly A-LOAM, show that the proposed approach achieves lower positional and angular errors while maintaining higher accuracy and resilience in GNSS-denied settings. This work contributes to the advancement of precision agriculture by enabling reliable and autonomous navigation in challenging outdoor environments.


Hashigo: A Next Generation Sketch Interactive System for Japanese Kanji

Taele, Paul, Hammond, Tracy

arXiv.org Artificial Intelligence

Language students can increase their effectiveness in learning written Japanese by mastering the visual structure and written technique of Japanese kanji. Yet, existing kanji handwriting recognition systems do not assess the written technique sufficiently enough to discourage students from developing bad learning habits. In this paper, we describe our work on Hashigo, a kanji sketch interactive system which achieves human instructor - level critique and feedback on both the visual structure and written technique of students' sketched kanji. This type of automated critique and feedback allows students to target and correct specific deficiencies in their sketches that, if left untreated, are detrimental to effective long - term kanji learning.


SEVA: Leveraging sketches to evaluate alignment between human and machine visual abstraction

Neural Information Processing Systems

Sketching is a powerful tool for creating abstract images that are sparse but meaningful. Sketch understanding poses fundamental challenges for general-purpose vision algorithms because it requires robustness to the sparsity of sketches relative to natural visual inputs and because it demands tolerance for semantic ambiguity, as sketches can reliably evoke multiple meanings. While current vision algorithms have achieved high performance on a variety of visual tasks, it remains unclear to what extent they understand sketches in a human-like way. Here we introduce \texttt{SEVA}, a new benchmark dataset containing approximately 90K human-generated sketches of 128 object concepts produced under different time constraints, and thus systematically varying in sparsity. We evaluated a suite of state-of-the-art vision algorithms on their ability to correctly identify the target concept depicted in these sketches and to generate responses that are strongly aligned with human response patterns on the same sketch recognition task.


Hybrid Primal Sketch: Combining Analogy, Qualitative Representations, and Computer Vision for Scene Understanding

Forbus, Kenneth D., Chen, Kezhen, Xu, Wangcheng, Usher, Madeline

arXiv.org Artificial Intelligence

One of the purposes of perception is to bridge between sensors and conceptual understanding. Marr's Primal Sketch combined initial edge-finding with multiple downstream processes to capture aspects of visual perception such as grouping and stereopsis. Given the progress made in multiple areas of AI since then, we have developed a new framework inspired by Marr's work, the Hybrid Primal Sketch, which combines computer vision components into an ensemble to produce sketch-like entities which are then further processed by CogSketch, our model of high-level human vision, to produce both more detailed shape representations and scene representations which can be used for data-efficient learning via analogical generalization. This paper describes our theoretical framework, summarizes several previous experiments, and outlines a new experiment in progress on diagram understanding.


DSS: Synthesizing long Digital Ink using Data augmentation, Style encoding and Split generation

Timofeev, Aleksandr, Fadeeva, Anastasiia, Afonin, Andrei, Musat, Claudiu, Maksai, Andrii

arXiv.org Artificial Intelligence

As text generative models can give increasingly long answers, we tackle the problem of synthesizing long text in digital ink. We show that the commonly used models for this task fail to generalize to long-form data and how this problem can be solved by augmenting the training data, changing the model architecture and the inference procedure. These methods use contrastive learning technique and are tailored specifically for the handwriting domain. They can be applied to any encoder-decoder model that works with digital ink. We demonstrate that our method reduces the character error rate on long-form English data by half compared to baseline RNN and by 16% compared to the previous approach that aims at addressing the same problem. We show that all three parts of the method improve recognizability of generated inks. In addition, we evaluate synthesized data in a human study and find that people perceive most of generated data as real.


Hausdorff Distance Matching with Adaptive Query Denoising for Rotated Detection Transformer

Lee, Hakjin, Song, Minki, Koo, Jamyoung, Seo, Junghoon

arXiv.org Artificial Intelligence

The Detection Transformer (DETR) has emerged as a pivotal role in object detection tasks, setting new performance benchmarks due to its end-to-end design and scalability. Despite its advancements, the application of DETR in detecting rotated objects has demonstrated suboptimal performance relative to established oriented object detectors. Our analysis identifies a key limitation: the L1 cost used in Hungarian Matching leads to duplicate predictions due to the square-like problem in oriented object detection, thereby obstructing the training process of the detector. We introduce a Hausdorff distance-based cost for Hungarian matching, which more accurately quantifies the discrepancy between predictions and ground truths. Moreover, we note that a static denoising approach hampers the training of rotated DETR, particularly when the detector's predictions surpass the quality of noised ground truths. We propose an adaptive query denoising technique, employing Hungarian matching to selectively filter out superfluous noised queries that no longer contribute to model improvement. Our proposed modifications to DETR have resulted in superior performance, surpassing previous rotated DETR models and other alternatives. This is evidenced by our model's state-of-the-art achievements in benchmarks such as DOTA-v1.0/v1.5/v2.0, and DIOR-R.


Gromov-Hausdorff Distances for Comparing Product Manifolds of Model Spaces

Borde, Haitz Saez de Ocariz, Arroyo, Alvaro, Morales, Ismael, Posner, Ingmar, Dong, Xiaowen

arXiv.org Artificial Intelligence

Recent studies propose enhancing machine learning models by aligning the geometric characteristics of the latent space with the underlying data structure. Instead of relying solely on Euclidean space, researchers have suggested using hyperbolic and spherical spaces with constant curvature, or their combinations (known as product manifolds), to improve model performance. However, there exists no principled technique to determine the best latent product manifold signature, which refers to the choice and dimensionality of manifold components. To address this, we introduce a novel notion of distance between candidate latent geometries using the Gromov-Hausdorff distance from metric geometry. We propose using a graph search space that uses the estimated Gromov-Hausdorff distances to search for the optimal latent geometry. In this work we focus on providing a description of an algorithm to compute the Gromov-Hausdorff distance between model spaces and its computational implementation.